Sales Prediction Model for State of Connecticut Cannabis Retail Sales

This data set contains preliminary weekly retail sales data for cannabis and cannabis products in both the adult-use cannabis and medical marijuana markets. The data is compiled at specific points in time and captures only what was current when the report was generated. Each weekly record covers retail sales from Sunday through Saturday. A week that spans two months is split at the month boundary, so each record includes only days from a single month; as a result, the first and last weeks of a month may show lower sales because they may cover fewer than 7 days. Data values may change over time as updates occur, so weekly reported data may not exactly match annually reported data.

Source Data : https://catalog.data.gov/dataset/cannabis-retail-sales-by-week-ending

Return Home : https://johnkimaiyo.vercel.app/

Creating a prediction model using Python and Pandas involves several steps, including data preprocessing, exploratory data analysis, feature engineering, model selection, training, and evaluation.

Step 1: Import Necessary Libraries

In [11]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
import joblib

Step 2: Load the Dataset

In [2]:
Cannibas_Sales_df = pd.read_csv(r"C:\Users\jki\Desktop\Data Scence Projects\Cannibas Retail Sales\Machine Learning\Source Data\Cannabis_Retail_Sales_by_Week_Ending.csv")

# Display the first few rows of the dataset
print(Cannibas_Sales_df.head())
  Week Ending  Adult-Use Retail Sales  Medical Marijuana Retail Sales  \
0  01/14/2023              1485019.32                      1776700.69   
1  01/21/2023              1487815.81                      2702525.61   
2  01/28/2023              1553216.30                      2726237.56   
3  01/31/2023               578840.62                       863287.86   
4  02/04/2023              1047436.20                      1971731.40   

   Total Adult-Use and Medical Sales  Adult-Use Products Sold  \
0                         3261720.01                    33610   
1                         4190341.42                    33005   
2                         4279453.86                    34854   
3                         1442128.48                    12990   
4                         3019167.60                    24134   

   Medical Products Sold  Total Products Sold  \
0                  49312                82922   
1                  77461               110466   
2                  76450               111304   
3                  24023                37013   
4                  56666                80800   

   Adult-Use Average Product Price  Medical Average Product Price  
0                            44.25                          36.23  
1                            45.08                          34.89  
2                            44.56                          35.65  
3                            44.56                          35.93  
4                            43.49                          34.84  
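The absolute path in the cell above exists only on the author's machine. As a minimal sketch of the same loading step, the block below reads a small in-memory CSV (an assumption standing in for the real file, built from the first rows printed above and reduced to three columns), parses the date column while reading, and checks that the expected columns are present. The names `csv_text`, `sample_df`, and `expected_columns` are illustrative only.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file: the first rows printed above,
# reduced to three columns to keep the sketch short.
csv_text = """Week Ending,Adult-Use Retail Sales,Medical Marijuana Retail Sales
01/14/2023,1485019.32,1776700.69
01/21/2023,1487815.81,2702525.61
01/28/2023,1553216.30,2726237.56
"""

# parse_dates converts 'Week Ending' to datetime while reading.
sample_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Week Ending"])

# Fail early if an expected column is missing (for example after a schema change).
expected_columns = {
    "Week Ending",
    "Adult-Use Retail Sales",
    "Medical Marijuana Retail Sales",
}
missing = expected_columns - set(sample_df.columns)
if missing:
    raise ValueError(f"Missing columns: {sorted(missing)}")

print(sample_df.dtypes)
```

With the real file, the same call would take a file path in place of the `io.StringIO` object.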

Step 3: Data Preprocessing

Before building the model, preprocess the data: check for missing values and convert data types. Apart from the 'Week Ending' date, every column in this dataset is numeric, so no categorical encoding is needed here.

In [3]:
# Check for missing values
print(Cannibas_Sales_df.isnull().sum())

# Convert 'Week Ending' to datetime format
Cannibas_Sales_df['Week Ending'] = pd.to_datetime(Cannibas_Sales_df['Week Ending'])

# Extract year, month, and day from the date
Cannibas_Sales_df['Year'] = Cannibas_Sales_df['Week Ending'].dt.year
Cannibas_Sales_df['Month'] = Cannibas_Sales_df['Week Ending'].dt.month
Cannibas_Sales_df['Day'] = Cannibas_Sales_df['Week Ending'].dt.day

# Drop the original 'Week Ending' column
Cannibas_Sales_df.drop('Week Ending', axis=1, inplace=True)

# Display the first few rows after preprocessing
print(Cannibas_Sales_df.head())
Week Ending                          0
Adult-Use Retail Sales               0
Medical Marijuana Retail Sales       0
Total Adult-Use and Medical Sales    0
Adult-Use Products Sold              0
Medical Products Sold                0
Total Products Sold                  0
Adult-Use Average Product Price      0
Medical Average Product Price        0
dtype: int64
   Adult-Use Retail Sales  Medical Marijuana Retail Sales  \
0              1485019.32                      1776700.69   
1              1487815.81                      2702525.61   
2              1553216.30                      2726237.56   
3               578840.62                       863287.86   
4              1047436.20                      1971731.40   

   Total Adult-Use and Medical Sales  Adult-Use Products Sold  \
0                         3261720.01                    33610   
1                         4190341.42                    33005   
2                         4279453.86                    34854   
3                         1442128.48                    12990   
4                         3019167.60                    24134   

   Medical Products Sold  Total Products Sold  \
0                  49312                82922   
1                  77461               110466   
2                  76450               111304   
3                  24023                37013   
4                  56666                80800   

   Adult-Use Average Product Price  Medical Average Product Price  Year  \
0                            44.25                          36.23  2023   
1                            45.08                          34.89  2023   
2                            44.56                          35.65  2023   
3                            44.56                          35.93  2023   
4                            43.49                          34.84  2023   

   Month  Day  
0      1   14  
1      1   21  
2      1   28  
3      1   31  
4      2    4  
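The check above found no missing values. If a later refresh of the dataset did contain gaps, one simple way to handle them is sketched below, under the assumption that rows without a target are dropped and remaining numeric gaps are filled with the column median. The frame `demo_df` is made up for illustration and is not part of the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps; the real dataset had none when it was checked.
demo_df = pd.DataFrame({
    "Total Adult-Use and Medical Sales": [3261720.01, np.nan, 4279453.86, 1442128.48],
    "Adult-Use Average Product Price": [44.25, 45.08, np.nan, 44.56],
})

target = "Total Adult-Use and Medical Sales"

# Rows without a target value cannot be used for training, so drop them.
cleaned = demo_df.dropna(subset=[target]).copy()

# Fill remaining numeric gaps with the column median (one simple choice among many).
feature_cols = cleaned.columns.drop(target)
cleaned[feature_cols] = cleaned[feature_cols].fillna(cleaned[feature_cols].median())

print(cleaned)
```

Median filling is only one option; for weekly data, interpolating between neighbouring weeks may be a better fit.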

Step 4: Exploratory Data Analysis (EDA)

Perform some basic EDA to understand the data distribution and relationships between variables.

In [4]:
# Summary statistics
print(Cannibas_Sales_df.describe())

# Correlation matrix
print(Cannibas_Sales_df.corr())

# Plotting the correlation matrix
import seaborn as sns
sns.heatmap(Cannibas_Sales_df.corr(), annot=True, cmap='coolwarm')
plt.show()
       Adult-Use Retail Sales  Medical Marijuana Retail Sales  \
count            1.290000e+02                    1.290000e+02   
mean             2.805301e+06                    1.777271e+06   
std              1.119186e+06                    6.973442e+05   
min              1.639950e+05                    6.283767e+04   
25%              2.005884e+06                    1.458784e+06   
50%              3.154663e+06                    1.818867e+06   
75%              3.781082e+06                    2.365348e+06   
max              4.495102e+06                    3.085787e+06   

       Total Adult-Use and Medical Sales  Adult-Use Products Sold  \
count                       1.290000e+02               129.000000   
mean                        4.582549e+06             71854.674419   
std                         1.560073e+06             30263.936939   
min                         2.268327e+05              4188.000000   
25%                         3.815815e+06             51174.000000   
50%                         5.385123e+06             81333.000000   
75%                         5.599181e+06             96544.000000   
max                         7.290974e+06            120223.000000   

       Medical Products Sold  Total Products Sold  \
count             129.000000           129.000000   
mean            49059.937984        121017.155039   
std             19173.146419         42855.211462   
min              1916.000000          6104.000000   
25%             41914.000000         96853.000000   
50%             51266.000000        140225.000000   
75%             62499.000000        148744.000000   
max             86307.000000        199162.000000   

       Adult-Use Average Product Price  Medical Average Product Price  \
count                       129.000000                     129.000000   
mean                         39.163566                      35.965271   
std                           1.661305                       1.734351   
min                          35.550000                      32.800000   
25%                          38.140000                      34.750000   
50%                          39.080000                      35.650000   
75%                          39.970000                      36.830000   
max                          45.080000                      41.830000   

              Year       Month         Day  
count   129.000000  129.000000  129.000000  
mean   2023.558140    6.325581   18.294574  
std       0.571552    3.531472    9.731074  
min    2023.000000    1.000000    1.000000  
25%    2023.000000    3.000000   10.000000  
50%    2024.000000    6.000000   19.000000  
75%    2024.000000    9.000000   28.000000  
max    2025.000000   12.000000   31.000000  
                                   Adult-Use Retail Sales  \
Adult-Use Retail Sales                           1.000000   
Medical Marijuana Retail Sales                   0.445148   
Total Adult-Use and Medical Sales                0.916391   
Adult-Use Products Sold                          0.985865   
Medical Products Sold                            0.487862   
Total Products Sold                              0.914167   
Adult-Use Average Product Price                 -0.388460   
Medical Average Product Price                   -0.291913   
Year                                             0.396208   
Month                                            0.279192   
Day                                              0.026906   

                                   Medical Marijuana Retail Sales  \
Adult-Use Retail Sales                                   0.445148   
Medical Marijuana Retail Sales                           1.000000   
Total Adult-Use and Medical Sales                        0.766368   
Adult-Use Products Sold                                  0.418163   
Medical Products Sold                                    0.987573   
Total Products Sold                                      0.736314   
Adult-Use Average Product Price                          0.252361   
Medical Average Product Price                            0.253649   
Year                                                    -0.423685   
Month                                                   -0.096965   
Day                                                      0.006971   

                                   Total Adult-Use and Medical Sales  \
Adult-Use Retail Sales                                      0.916391   
Medical Marijuana Retail Sales                              0.766368   
Total Adult-Use and Medical Sales                           1.000000   
Adult-Use Products Sold                                     0.894187   
Medical Products Sold                                       0.791456   
Total Products Sold                                         0.984970   
Adult-Use Average Product Price                            -0.165875   
Medical Average Product Price                              -0.096031   
Year                                                        0.094840   
Month                                                       0.156928   
Day                                                         0.022443   

                                   Adult-Use Products Sold  \
Adult-Use Retail Sales                            0.985865   
Medical Marijuana Retail Sales                    0.418163   
Total Adult-Use and Medical Sales                 0.894187   
Adult-Use Products Sold                           1.000000   
Medical Products Sold                             0.478523   
Total Products Sold                               0.920014   
Adult-Use Average Product Price                  -0.438852   
Medical Average Product Price                    -0.303896   
Year                                              0.393841   
Month                                             0.296591   
Day                                               0.020368   

                                   Medical Products Sold  Total Products Sold  \
Adult-Use Retail Sales                          0.487862             0.914167   
Medical Marijuana Retail Sales                  0.987573             0.736314   
Total Adult-Use and Medical Sales               0.791456             0.984970   
Adult-Use Products Sold                         0.478523             0.920014   
Medical Products Sold                           1.000000             0.784075   
Total Products Sold                             0.784075             1.000000   
Adult-Use Average Product Price                 0.223924            -0.209915   
Medical Average Product Price                   0.159102            -0.143934   
Year                                           -0.364514             0.115209   
Month                                          -0.100276             0.164167   
Day                                            -0.002415             0.016054   

                                   Adult-Use Average Product Price  \
Adult-Use Retail Sales                                   -0.388460   
Medical Marijuana Retail Sales                            0.252361   
Total Adult-Use and Medical Sales                        -0.165875   
Adult-Use Products Sold                                  -0.438852   
Medical Products Sold                                     0.223924   
Total Products Sold                                      -0.209915   
Adult-Use Average Product Price                           1.000000   
Medical Average Product Price                             0.347670   
Year                                                     -0.351466   
Month                                                    -0.555304   
Day                                                      -0.111863   

                                   Medical Average Product Price      Year  \
Adult-Use Retail Sales                                 -0.291913  0.396208   
Medical Marijuana Retail Sales                          0.253649 -0.423685   
Total Adult-Use and Medical Sales                      -0.096031  0.094840   
Adult-Use Products Sold                                -0.303896  0.393841   
Medical Products Sold                                   0.159102 -0.364514   
Total Products Sold                                    -0.143934  0.115209   
Adult-Use Average Product Price                         0.347670 -0.351466   
Medical Average Product Price                           1.000000 -0.619702   
Year                                                   -0.619702  1.000000   
Month                                                  -0.056917 -0.179758   
Day                                                    -0.158841 -0.003103   

                                      Month       Day  
Adult-Use Retail Sales             0.279192  0.026906  
Medical Marijuana Retail Sales    -0.096965  0.006971  
Total Adult-Use and Medical Sales  0.156928  0.022443  
Adult-Use Products Sold            0.296591  0.020368  
Medical Products Sold             -0.100276 -0.002415  
Total Products Sold                0.164167  0.016054  
Adult-Use Average Product Price   -0.555304 -0.111863  
Medical Average Product Price     -0.056917 -0.158841  
Year                              -0.179758 -0.003103  
Month                              1.000000 -0.010315  
Day                               -0.010315  1.000000  
[Figure: heatmap of the correlation matrix]
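The full correlation matrix is hard to scan. A shorter view, sketched below, ranks the columns by the absolute value of their correlation with the target. The frame `demo` and its column names are synthetic assumptions used only to keep the example self-contained; on the notebook's data the same expression would be applied to `Cannibas_Sales_df` with 'Total Adult-Use and Medical Sales' as the target.

```python
import numpy as np
import pandas as pd

# Synthetic data: sales are driven by units sold, with a small price effect.
rng = np.random.default_rng(0)
n = 50
demo = pd.DataFrame({
    "Total Products Sold": rng.integers(50_000, 150_000, size=n),
    "Average Price": rng.normal(38.0, 1.5, size=n),
    "Day": rng.integers(1, 29, size=n),
})
demo["Total Sales"] = demo["Total Products Sold"] * demo["Average Price"]

target = "Total Sales"

# Correlation of every other column with the target, strongest first.
corr_with_target = (
    demo.corr()[target]
    .drop(target)
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(corr_with_target)
```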

Step 5: Feature Engineering

Feature engineering involves creating new features or transforming existing ones to improve the model's performance.

In [5]:
# Create a new feature: Total Products Sold per Week
# Note: this sum duplicates the existing 'Total Products Sold' column (compare the output below)
Cannibas_Sales_df['Total Products Sold per Week'] = Cannibas_Sales_df['Adult-Use Products Sold'] + Cannibas_Sales_df['Medical Products Sold']

# Display the first few rows after feature engineering
print(Cannibas_Sales_df.head())
   Adult-Use Retail Sales  Medical Marijuana Retail Sales  \
0              1485019.32                      1776700.69   
1              1487815.81                      2702525.61   
2              1553216.30                      2726237.56   
3               578840.62                       863287.86   
4              1047436.20                      1971731.40   

   Total Adult-Use and Medical Sales  Adult-Use Products Sold  \
0                         3261720.01                    33610   
1                         4190341.42                    33005   
2                         4279453.86                    34854   
3                         1442128.48                    12990   
4                         3019167.60                    24134   

   Medical Products Sold  Total Products Sold  \
0                  49312                82922   
1                  77461               110466   
2                  76450               111304   
3                  24023                37013   
4                  56666                80800   

   Adult-Use Average Product Price  Medical Average Product Price  Year  \
0                            44.25                          36.23  2023   
1                            45.08                          34.89  2023   
2                            44.56                          35.65  2023   
3                            44.56                          35.93  2023   
4                            43.49                          34.84  2023   

   Month  Day  Total Products Sold per Week  
0      1   14                         82922  
1      1   21                        110466  
2      1   28                        111304  
3      1   31                         37013  
4      2    4                         80800  
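The dataset description says weeks are split at month boundaries, so some rows cover fewer than 7 days. A feature that may carry more information than the duplicate column above is the number of days each row covers. The sketch below derives it from the week-ending date, assuming weeks run Sunday through Saturday as described; it would have to be computed before 'Week Ending' is dropped in Step 3. The name `days_in_week` is illustrative.

```python
import pandas as pd

# The first five week-ending dates from the output in Step 2.
week_ending = pd.to_datetime(
    pd.Series(["01/14/2023", "01/21/2023", "01/28/2023", "01/31/2023", "02/04/2023"]),
    format="%m/%d/%Y",
)

# Days from the preceding Sunday through the week-ending date, inclusive.
# pandas uses Monday=0 ... Sunday=6, so Saturday maps to 7 and Sunday to 1.
days_since_sunday = (week_ending.dt.dayofweek + 1) % 7 + 1

# A week cannot reach back into the previous month, so cap at the day of month.
days_in_week = days_since_sunday.clip(upper=week_ending.dt.day)

print(days_in_week.tolist())  # [7, 7, 7, 3, 4]
```

The 3-day and 4-day rows line up with the noticeably lower sales on 01/31/2023 and 02/04/2023 in the printed data.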

Step 6: Splitting the Data

Split the data into training and testing sets.

In [6]:
# Define features (X) and target (y)
# Note: X keeps both sales columns, whose sum equals the target
X = Cannibas_Sales_df.drop(['Total Adult-Use and Medical Sales'], axis=1)
y = Cannibas_Sales_df['Total Adult-Use and Medical Sales']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)
(103, 11) (26, 11)
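`train_test_split` shuffles rows by default, so the test set above mixes weeks from the whole date range. For weekly sales data, an alternative (not what this notebook does) is to hold out the most recent weeks. The sketch below shows that idea on a hypothetical ten-row frame; `demo`, `split_at`, and the `_chrono` names are assumptions for illustration.

```python
import pandas as pd

target = "Total Adult-Use and Medical Sales"

# Hypothetical frame: ten weekly rows with made-up values.
dates = pd.date_range("2023-01-07", periods=10, freq="7D")
demo = pd.DataFrame({
    "Week Ending": dates,
    "Total Products Sold": range(100, 110),
    target: range(1000, 1010),
})

# Mirror Step 3: split the date into parts and drop the original column.
demo["Year"] = demo["Week Ending"].dt.year
demo["Month"] = demo["Week Ending"].dt.month
demo["Day"] = demo["Week Ending"].dt.day
demo = demo.drop(columns="Week Ending")

# Order rows by date, then hold out the most recent 20% as the test set.
demo = demo.sort_values(["Year", "Month", "Day"]).reset_index(drop=True)
split_at = int(len(demo) * 0.8)
train_df, test_df = demo.iloc[:split_at], demo.iloc[split_at:]

X_train_chrono, y_train_chrono = train_df.drop(columns=[target]), train_df[target]
X_test_chrono, y_test_chrono = test_df.drop(columns=[target]), test_df[target]

print(X_train_chrono.shape, X_test_chrono.shape)  # (8, 4) (2, 4)
```

A chronological hold-out gives a more realistic picture of how the model would perform on future weeks, at the cost of a test set drawn from a single period.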

Step 7: Model Selection and Training

Choose a model and train it on the training data. For simplicity, we'll use a Linear Regression model.

In [7]:
# Initialize the model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)
Out[7]:
LinearRegression()
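Before trusting the fitted model, it helps to look at what it learned. In the rows printed in Step 2, 'Total Adult-Use and Medical Sales' equals 'Adult-Use Retail Sales' plus 'Medical Marijuana Retail Sales', and both of those columns are among the features. The sketch below checks this on those five rows and fits a linear model on the two columns alone. It is an illustration on a five-row sample, not a rerun of the notebook's training; `rows` and `leak_model` are illustrative names.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# First five rows of the dataset, copied from the output in Step 2.
rows = pd.DataFrame({
    "Adult-Use Retail Sales": [1485019.32, 1487815.81, 1553216.30, 578840.62, 1047436.20],
    "Medical Marijuana Retail Sales": [1776700.69, 2702525.61, 2726237.56, 863287.86, 1971731.40],
    "Total Adult-Use and Medical Sales": [3261720.01, 4190341.42, 4279453.86, 1442128.48, 3019167.60],
})

features = ["Adult-Use Retail Sales", "Medical Marijuana Retail Sales"]
target = "Total Adult-Use and Medical Sales"

# The target equals the sum of the two sales columns (up to rounding).
is_exact_sum = np.allclose(rows[features].sum(axis=1), rows[target])

# A linear model fitted on those two columns alone recovers coefficients near 1.
leak_model = LinearRegression().fit(rows[features], rows[target])
coefficients = pd.Series(leak_model.coef_, index=features)

print(is_exact_sum)
print(coefficients.round(4))
```

If the goal is to forecast total sales before the component sales are known, these two columns would need to be excluded from `X`; with them included, the model mostly reconstructs a sum.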

Step 8: Model Evaluation

Evaluate the model's performance on the test data.

In [8]:
# Make predictions
y_pred = model.predict(X_test)

# Calculate the Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')

# Plot the actual vs predicted values
plt.scatter(y_test, y_pred)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('Actual vs Predicted')
plt.show()
Mean Squared Error: 346692.3760245455
[Figure: scatter plot of actual vs. predicted total sales]
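MSE is expressed in squared dollars, which is hard to interpret on its own. The sketch below computes RMSE, MAE, and R² alongside it, using small made-up arrays (`y_true` and `y_hat` are assumptions, not the notebook's test set), and also takes the square root of the MSE reported above to put it back in dollars.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted weekly sales, in dollars.
y_true = np.array([3_000_000.0, 4_200_000.0, 5_100_000.0, 1_400_000.0])
y_hat = np.array([3_000_500.0, 4_199_400.0, 5_100_700.0, 1_399_600.0])

mse_demo = mean_squared_error(y_true, y_hat)
rmse_demo = np.sqrt(mse_demo)  # same units as the target (dollars)
mae_demo = mean_absolute_error(y_true, y_hat)
r2_demo = r2_score(y_true, y_hat)

print(f"MSE:  {mse_demo:.2f}")
print(f"RMSE: {rmse_demo:.2f}")
print(f"MAE:  {mae_demo:.2f}")
print(f"R^2:  {r2_demo:.6f}")

# RMSE implied by the MSE reported in this step.
reported_rmse = np.sqrt(346692.3760245455)
print(f"RMSE from the reported MSE: {reported_rmse:.1f}")
```

The reported MSE corresponds to an RMSE of roughly 589 dollars on weekly totals in the millions, which is small enough to warrant checking whether the features already contain the target.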

Step 9: Making Predictions

You can now use the trained model to make predictions on new data. The new data must contain the same feature columns that were used in training.

In [12]:
# Example: Predict on new data
new_data = pd.DataFrame({
    'Adult-Use Retail Sales': [1500000],
    'Medical Marijuana Retail Sales': [1800000],
    'Adult-Use Products Sold': [30000],
    'Medical Products Sold': [50000],
    'Total Products Sold': [80000],
    'Adult-Use Average Product Price': [40],
    'Medical Average Product Price': [35],
    'Year': [2024],
    'Month': [1],
    'Day': [15],
    'Total Products Sold per Week': [80000]
})

# Save the model to a file
joblib.dump(model, 'cannabis_sales_model.pkl')

predicted_sales = model.predict(new_data)
print(f'Predicted Total Sales: {predicted_sales[0]}')
Predicted Total Sales: 3300001.5161294458

Summary

Import Libraries: Import necessary libraries like Pandas, NumPy, and Scikit-learn.

Load Data: Load the dataset into a Pandas DataFrame.

Preprocess Data: Check for missing values, convert data types, and extract date parts.

EDA: Perform exploratory data analysis to understand the data.

Feature Engineering: Create new features or transform existing ones.

Split Data: Split the data into training and testing sets.

Train Model: Choose a model and train it on the training data.

Evaluate Model: Evaluate the model's performance on the test data.

Make Predictions: Use the trained model to make predictions on new data.
